Automatic Performance Tuning (Autotuning)

Authors

  • James Demmel
  • Sam Williams
  • Katherine Yelick
Abstract

Understanding the most efficient design and utilization of emerging multicore systems is one of the most challenging questions faced by the mainstream and scientific computing industries in several decades. Our work explores multicore stencil (nearest-neighbor) computations, a class of algorithms at the heart of many structured grid codes, including PDE solvers. We develop a number of effective optimization strategies and build an auto-tuning environment that searches over our optimizations and their parameters to minimize runtime while maximizing performance portability. To evaluate the effectiveness of these strategies, we explore the broadest set of multicore architectures in the current HPC literature, including the Intel Clovertown, AMD Barcelona, Sun Victoria Falls, IBM QS22 PowerXCell 8i, and NVIDIA GTX280. Overall, our auto-tuning optimization methodology results in the fastest multicore stencil performance to date. Finally, we present several key insights into the architectural trade-offs of emerging multicore designs and their implications for scientific algorithm development.
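The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' auto-tuner: it assumes a single optimization (loop blocking of a 2D 5-point stencil), a tiny pure-Python kernel, and an exhaustive search over block sizes. The names `make_grid`, `stencil_blocked`, and `autotune` are hypothetical and chosen only for illustration.

```python
import itertools
import time


def make_grid(n):
    """Build an n x n grid with a simple deterministic fill (illustrative data)."""
    return [[float((i * n + j) % 7) for j in range(n)] for i in range(n)]


def stencil_blocked(src, block_i, block_j):
    """One sweep of a 5-point nearest-neighbor stencil over the interior cells,
    traversed in block_i x block_j tiles. Boundary cells are copied unchanged."""
    n = len(src)
    dst = [row[:] for row in src]
    for ii in range(1, n - 1, block_i):
        for jj in range(1, n - 1, block_j):
            for i in range(ii, min(ii + block_i, n - 1)):
                for j in range(jj, min(jj + block_j, n - 1)):
                    dst[i][j] = 0.2 * (src[i][j] + src[i - 1][j] + src[i + 1][j]
                                       + src[i][j - 1] + src[i][j + 1])
    return dst


def autotune(src, candidates, trials=3):
    """Time every (block_i, block_j) pair drawn from candidates.

    Returns the fastest pair and the full table of best-of-trials timings.
    Exhaustive search is an assumption of this sketch; a real tuner may prune.
    """
    timings = {}
    for bi, bj in itertools.product(candidates, repeat=2):
        best = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            stencil_blocked(src, bi, bj)
            best = min(best, time.perf_counter() - start)
        timings[(bi, bj)] = best
    return min(timings, key=timings.get), timings
```

Calling `autotune(make_grid(64), [4, 8, 16, 32])` returns the block shape that ran fastest on the current machine together with all measured timings. The chosen shape will generally differ from one machine to another, which is the reason for tuning empirically rather than fixing parameters by hand.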

Similar resources

Special issue on automatic application tuning for HPC architectures

High Performance Computing architectures have become incredibly complex and exploiting their full potential is becoming more and more challenging. As a consequence, automatic performance tuning (autotuning) of HPC applications is of growing interest and many research groups around the world are currently involved. Autotuning is still a rapidly evolving research field with many different approac...

Performance Limitation and Autotuning of Inverse Optimal PID Controller for Lagrangian Systems

The PID trajectory tracking controller for Lagrangian systems shows a performance limitation imposed by the inverse dynamics along the desired trajectory. Since an equilibrium point cannot be defined for a control system with this performance limitation, we define the quasiequilibrium region as an alternative to the equilibrium point. This analysis result of performance limitation can guide ...

PERI Autotuning of PFLOTRAN

In response to the enormous and growing complexity of today’s high-end systems, the Performance Engineering Research Institute (PERI) is working toward automating portions of the performance tuning process by developing an autotuning framework. Our framework employs empirical techniques to identify the best-performing version of a computation among a search space of possible implementations. Th...

Piecewise Holistic Autotuning of Compiler and Runtime Parameters

Current architecture complexity requires fine tuning of compiler and runtime parameters to achieve full potential performance. Autotuning substantially improves default parameters in many scenarios but it is a costly process requiring a long iterative evaluation. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small p...

Autotuning Aspects for Dynamic Positioning Systems

This paper considers aspects related to the problem of automatic tuning (autotuning) of dynamic positioning (DP) controllers for marine surface vessels. Autotuning involves automatic adjustment of controller parameters for shifting vessel operational conditions (VOCs), which encompass changing vessel dynamics and different operational tasks. A controller with fixed gains cannot behave equally w...

Piecewise holistic autotuning of parallel programs with CERE

Current architecture complexity requires fine tuning of compiler and runtime parameters to achieve best performance. Autotuning substantially improves default parameters in many scenarios but it is a costly process requiring long iterative evaluations. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces calle...

Publication date: 2013